Accelerating a PARSEC Benchmark Using Portable Subword SIMD

Authors

  • Saugata Ghose
  • Shreesha Srinath
  • Jonathan Tse
Abstract

We present a case study of the GNU Compiler Collection (GCC) Vector Extensions in GCC 4.7. In particular, we compare the performance of explicit vector code written with the GCC Vector Extensions against that of automatically vectorized code from the Intel C++ Compiler (ICC). Our analysis focuses on the interactions between data-level and thread-level parallelism in the streamcluster benchmark from the PARSEC benchmark suite, and examines the tradeoffs between portability and performance across different vectorization techniques.
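As a rough illustration of what explicit vector code with the GCC Vector Extensions can look like, the sketch below computes a squared Euclidean distance between two points, the kind of kernel a clustering benchmark such as streamcluster relies on. It is not the authors' code: the names `v4sf` and `dist_sq`, the 16-byte vector width, and the scalar tail loop are all assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical type name: four packed floats (vector_size is in bytes). */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical helper: squared Euclidean distance between a and b,
 * both holding dim floats. Not taken from the paper. */
static float dist_sq(const float *a, const float *b, int dim)
{
    v4sf acc = {0.0f, 0.0f, 0.0f, 0.0f};
    float lanes[4];
    float sum;
    int i = 0;

    assert(dim >= 0);

    /* Vector body: process four elements per iteration. */
    for (; i + 4 <= dim; i += 4) {
        v4sf va, vb, d;
        /* memcpy avoids alignment assumptions about a and b. */
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        d = va - vb;   /* element-wise subtract */
        acc += d * d;  /* element-wise multiply and accumulate */
    }

    /* Horizontal reduction of the four partial sums. */
    memcpy(lanes, &acc, sizeof lanes);
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* Scalar tail for dimensions that are not a multiple of four. */
    for (; i < dim; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}
```

Because the arithmetic is written with ordinary C operators on a vector type, the compiler chooses the target's SIMD instructions, which is where the portability comes from. An auto-vectorizing compiler would instead be handed the plain scalar loop and left to find the same parallelism itself.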

Similar resources

Accelerating multimedia with enhanced microprocessors

A minimalistic set of multimedia instructions introduced into PA-RISC microprocessors implements SIMD-MIMD parallelism with insignificant changes to the underlying microprocessor. Thus, a software video decoder attains MPEG video and audio decompression and playback at real-time rates of 30 frames per second, on an entry-level workstation. Our general-purpose parallel subword instructions can a...

High-performance and Energy-efficient Heterogeneous Subword Parallel Instructions

High instruction throughput and energy efficiency are becoming increasingly important design requirements for embedded and mobile computing systems. This paper presents the Quantized Color Pack extension (QCPX) ISA to improve execution performance of multimedia processing applications on programmable superscalar processors while reducing the energy consumption for these applications. QCPX expl...

A Characterization of the PARSEC Benchmark Suite for CMP Design

The shared-memory, multi-threaded PARSEC benchmark suite is intended to represent emerging software workloads for future systems. It is specifically intended for use by both industry and academia as a tool for testing new Chip Multiprocessor (CMP) designs. We analyze the suite in detail and identify bottlenecks using hardware performance counters. We take a systems-level approach, with an empha...

Modeling the Effects on Power and Performance from Memory Interference of Co-located Applications in Multicore Systems

In this study, we analyze interference trends when co-running multiple applications possessing varying degrees of memory intensity on multi-core processors. We conduct tests with PARSEC benchmark applications and explore energy consumption, execution times, and main memory accesses when interfering applications share last-level cache. We also explore how co-running applications are impacted when...

PARSEC Benchmark Suite: A Parallel Implementation on GPU using CUDA

Graphics Processing Units (GPUs) are a class of specialized parallel architectures with tremendous computational power. The Compute Unified Device Architecture (CUDA) programming model from NVIDIA facilitates programming of general purpose applications on their GPUs. In this project, we target PARSEC benchmarks to provide orders of performance speed up and reducing overall execution time on mu...

Publication date: 2011